Add new DLM Frozen Tier Transition execution plugin and service#144595
Add new DLM Frozen Tier Transition execution plugin and service#144595lukewhiting merged 17 commits intoelastic:mainfrom
Conversation
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
📝 WalkthroughWalkthroughThis change introduces a new x-pack plugin ( ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip CodeRabbit can use OpenGrep to find security vulnerabilities and bugs across 17+ programming languages.OpenGrep is compatible with Semgrep configurations. Add an |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@x-pack/plugin/dlm-frozen-transition/src/main/java/org/elasticsearch/xpack/dlm/frozen/DlmFrozenTransitionExecutor.java`:
- Around line 79-81: The catch block in DlmFrozenTransitionExecutor (the
try/catch where logger.error(() -> Strings.format("Error executing transition
for index [%s]", indexName), t) is called) currently catches Throwable which
swallows Error subclasses; change it to catch Exception (e.g., catch (Exception
e)) so only recoverable exceptions are handled and logged via logger.error, and
allow Errors to propagate after any finally cleanup has run; keep the existing
finally/cleanup logic untouched so it executes before propagation.
In
`@x-pack/plugin/dlm-frozen-transition/src/test/java/org/elasticsearch/xpack/dlm/frozen/DlmFrozenTransitionServiceTests.java`:
- Around line 93-97: The tearDown method currently calls super.tearDown() before
closing local resources which can leak; modify the
DlmFrozenTransitionServiceTests.tearDown method so that clusterService.close()
and terminate(threadPool) are executed first, then call super.tearDown() as the
last step to ensure local cleanup (clusterService and threadPool) happens before
superclass teardown.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 826ca642-bb51-4e6e-bacb-32e50d28f6e9
📒 Files selected for processing (11)
server/src/main/java/org/elasticsearch/common/util/concurrent/EsExecutors.javax-pack/plugin/dlm-frozen-transition/build.gradlex-pack/plugin/dlm-frozen-transition/src/main/java/org/elasticsearch/xpack/dlm/frozen/DataStreamLifecycleConvertToFrozen.javax-pack/plugin/dlm-frozen-transition/src/main/java/org/elasticsearch/xpack/dlm/frozen/DlmFrozenTransitionExecutor.javax-pack/plugin/dlm-frozen-transition/src/main/java/org/elasticsearch/xpack/dlm/frozen/DlmFrozenTransitionPlugin.javax-pack/plugin/dlm-frozen-transition/src/main/java/org/elasticsearch/xpack/dlm/frozen/DlmFrozenTransitionRunnable.javax-pack/plugin/dlm-frozen-transition/src/main/java/org/elasticsearch/xpack/dlm/frozen/DlmFrozenTransitionService.javax-pack/plugin/dlm-frozen-transition/src/main/plugin-metadata/entitlement-policy.yamlx-pack/plugin/dlm-frozen-transition/src/test/java/org/elasticsearch/xpack/dlm/frozen/DataStreamLifecycleConvertToFrozenMarkReadOnlyTests.javax-pack/plugin/dlm-frozen-transition/src/test/java/org/elasticsearch/xpack/dlm/frozen/DlmFrozenTransitionExecutorTests.javax-pack/plugin/dlm-frozen-transition/src/test/java/org/elasticsearch/xpack/dlm/frozen/DlmFrozenTransitionServiceTests.java
...transition/src/main/java/org/elasticsearch/xpack/dlm/frozen/DlmFrozenTransitionExecutor.java
Outdated
Show resolved
Hide resolved
...sition/src/test/java/org/elasticsearch/xpack/dlm/frozen/DlmFrozenTransitionServiceTests.java
Show resolved
Hide resolved
…earch/xpack/dlm/frozen/DlmFrozenTransitionServiceTests.java Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
dakrone
left a comment
There was a problem hiding this comment.
Thanks for working on this Luke, I left some comments.
server/src/main/java/org/elasticsearch/common/util/concurrent/EsExecutors.java
Outdated
Show resolved
Hide resolved
...transition/src/main/java/org/elasticsearch/xpack/dlm/frozen/DlmFrozenTransitionExecutor.java
Outdated
Show resolved
Hide resolved
| import java.util.concurrent.ExecutorService; | ||
| import java.util.concurrent.Future; | ||
|
|
||
| import static org.apache.logging.log4j.LogManager.getLogger; |
There was a problem hiding this comment.
This should be using org.elasticsearch.logging.LogManager, not log4j's
There was a problem hiding this comment.
Switched.
Out of curiosity, what's the history behind this? It seems like a pretty direct wrapper around slf4j and is used pretty sporadically through the codebase vs actual slf4j...
...transition/src/main/java/org/elasticsearch/xpack/dlm/frozen/DlmFrozenTransitionExecutor.java
Show resolved
Hide resolved
...transition/src/main/java/org/elasticsearch/xpack/dlm/frozen/DlmFrozenTransitionExecutor.java
Outdated
Show resolved
Hide resolved
...transition/src/main/java/org/elasticsearch/xpack/dlm/frozen/DlmFrozenTransitionExecutor.java
Show resolved
Hide resolved
...-transition/src/main/java/org/elasticsearch/xpack/dlm/frozen/DlmFrozenTransitionService.java
Show resolved
Hide resolved
...-transition/src/main/java/org/elasticsearch/xpack/dlm/frozen/DlmFrozenTransitionService.java
Outdated
Show resolved
Hide resolved
| thread.setDaemon(true); | ||
| return thread; | ||
| }); | ||
| schedulerThreadExecutor.scheduleAtFixedRate(this::checkForFrozenIndices, 0, pollInterval.millis(), TimeUnit.MILLISECONDS); |
There was a problem hiding this comment.
I remember there being a reason why we preferred not to use scheduleAtFixedRate, but unfortunately I can't remember the reason. Is there a reason you didn't go with the SchedulerEngine the way that the existing DLM service (and many of our services) does? I.e.,
Then you have the other bits abstracted away instead of dealing with the executor directly. (And logging in case anything goes wrong)
There was a problem hiding this comment.
It seemed a bit overkill for just triggering a single method on a fixed schedule but it's something I could be persuaded on if you think there are advantages?
There was a problem hiding this comment.
One nice thing about the SchedulerEngine is that its invocations are rescheduled only after work has done, rather than having to deal with overlapping triggers. It also automatically uses the EsExecutors.daemonThreadFactory instead of having to set it on the thread itself here. It also has a more robust .stop() method that does shutdown, waits, can timeout, interrupts, etc without having to do it yourself.
Overall I think it's nice to abstract that bit away into the SchedulerEngine class more than manage size-1 threadpools in places.
There was a problem hiding this comment.
One nice thing about the SchedulerEngine is that its invocations are rescheduled only after work has done, rather than having to deal with overlapping triggers.
I believe that's already the case with scheduleAtFixedRate. According to the docs:
If any execution of this task takes longer than its period, then subsequent executions may start late, but will not concurrently execute.
But I think the shutdown management stuff is worth it. I'll get that switched over.
There was a problem hiding this comment.
Yeah change of plan... I implemented this and it's made the class more complex 😅
It didn't simplify the shutdown / startup (actually made it 2 lines longer) plus it adds the requirement to extend an interface, register the class with the scheduler and the tests are more complex as it does not fire immediately on registration.
I really can't see an advantage of SchedulerEnginer over a modern ScheduledExecutorService... I'm guessing it's a holdover from before that existed? Perhaps I'm missing something else ES specific?
I'm going to stick with this for now. Long term, I think it will be easier for devs to maintain a standard JVM feature than something custom.
There was a problem hiding this comment.
What I have done is made use of our ThreadPools.terminate(...) on the close() methods. This will made the case of a node being asked to shutdown a bit more graceful will still using the force interrupt shutdownNow() for when the node is de-elected as master.
...-transition/src/main/java/org/elasticsearch/xpack/dlm/frozen/DlmFrozenTransitionService.java
Show resolved
Hide resolved
This comment was marked as resolved.
This comment was marked as resolved.
dakrone
left a comment
There was a problem hiding this comment.
I left some more comments, I realize a lot of this stuff is personal preference and subjective, but hopefully I can be convincing in some cases anyway :)
server/src/main/java/org/elasticsearch/common/util/concurrent/EsExecutors.java
Outdated
Show resolved
Hide resolved
...transition/src/main/java/org/elasticsearch/xpack/dlm/frozen/DlmFrozenTransitionExecutor.java
Outdated
Show resolved
Hide resolved
| private final AtomicBoolean isMaster = new AtomicBoolean(false); | ||
| private volatile ScheduledExecutorService schedulerThreadExecutor; | ||
| private volatile DlmFrozenTransitionExecutor transitionExecutor; | ||
| private volatile boolean closing = false; |
There was a problem hiding this comment.
My personal preference is to call this "running" to avoid having to read it as "if not closing" below, but it is only a personal preference, no need to change if you prefer it as "closing"
There was a problem hiding this comment.
It seems I'm in a bit of a contrary mood on this PR but I'm going to disagree again 😅
We check both closing == true and closing == false in this class plus closing in a thing that is happening vs closed which is a state and we are tracking the former so I think it fits better?
Again, I'm not married to either, more laying out my logic.
| thread.setDaemon(true); | ||
| return thread; | ||
| }); | ||
| schedulerThreadExecutor.scheduleAtFixedRate(this::checkForFrozenIndices, 0, pollInterval.millis(), TimeUnit.MILLISECONDS); |
There was a problem hiding this comment.
One nice thing about the SchedulerEngine is that its invocations are rescheduled only after work has done, rather than having to deal with overlapping triggers. It also automatically uses the EsExecutors.daemonThreadFactory instead of having to set it on the thread itself here. It also has a more robust .stop() method that does shutdown, waits, can timeout, interrupts, etc without having to do it yourself.
Overall I think it's nice to abstract that bit away into the SchedulerEngine class more than manage size-1 threadpools in places.
...-transition/src/main/java/org/elasticsearch/xpack/dlm/frozen/DlmFrozenTransitionService.java
Outdated
Show resolved
Hide resolved
...-transition/src/main/java/org/elasticsearch/xpack/dlm/frozen/DlmFrozenTransitionService.java
Show resolved
Hide resolved
...-transition/src/main/java/org/elasticsearch/xpack/dlm/frozen/DlmFrozenTransitionService.java
Show resolved
Hide resolved
bdee0af to
aae1e77
Compare
dakrone
left a comment
There was a problem hiding this comment.
LGTM, I left a few more comments but nothing that would require a re-review, thanks for the iteration on this!
server/src/main/java/org/elasticsearch/common/util/concurrent/EsExecutors.java
Outdated
Show resolved
Hide resolved
...transition/src/main/java/org/elasticsearch/xpack/dlm/frozen/DlmFrozenTransitionExecutor.java
Outdated
Show resolved
Hide resolved
...-transition/src/main/java/org/elasticsearch/xpack/dlm/frozen/DlmFrozenTransitionService.java
Show resolved
Hide resolved
...-transition/src/main/java/org/elasticsearch/xpack/dlm/frozen/DlmFrozenTransitionService.java
Outdated
Show resolved
Hide resolved
...-transition/src/main/java/org/elasticsearch/xpack/dlm/frozen/DlmFrozenTransitionService.java
Outdated
Show resolved
Hide resolved
...-transition/src/main/java/org/elasticsearch/xpack/dlm/frozen/DlmFrozenTransitionService.java
Show resolved
Hide resolved
...sition/src/test/java/org/elasticsearch/xpack/dlm/frozen/DlmFrozenTransitionServiceTests.java
Outdated
Show resolved
Hide resolved
...ition/src/test/java/org/elasticsearch/xpack/dlm/frozen/DlmFrozenTransitionExecutorTests.java
Outdated
Show resolved
Hide resolved
…-frozen-task-executor-service
...-transition/src/main/java/org/elasticsearch/xpack/dlm/frozen/DlmFrozenTransitionService.java
Show resolved
Hide resolved
…earch/xpack/dlm/frozen/DlmFrozenTransitionService.java Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
* upstream/main: (146 commits) Revert "[Native] Gradle-related tweaks to improve handling of the simdvec native library (elastic#144539)" Fix ArrayIndexOutOfBoundsException in fetch phase with partial results (elastic#144385) ESQL: Correctly manage NULL data type for SUM (elastic#144942) [ESQL] Fixes GroupedTopNBenchmark not executing (elastic#144944) Fix reader context leak when query response serialization fails (elastic#144708) Validate individual offset values in BULK_OFFSETS bounds checks (elastic#144643) Merge main21 source set into main in simdvec (elastic#144921) [TEST] Unmute TsidExtractingIdFieldMapperTests (elastic#144848) [Native] Gradle-related tweaks to improve handling of the simdvec native library (elastic#144539) Fix `ThreadedActionListenerTests#testRejectionHandling` (elastic#144795) Add new DLM Frozen Tier Transition execution plugin and service (elastic#144595) Prometheus: execute query_range via parsed EsqlStatement plan (elastic#144416) Investigate `testBulkIndexingRequestSplitting` failure (elastic#144766) Add test utility for wrapping directories in FilterDirectory layer (elastic#143563) Fix ES|QL decay tests with negative scale (elastic#144657) Fix circuit breaker leak in percolator query construction (elastic#144827) Use XPerFieldDocValuesFormat in AbstractTSDBSyntheticIdCodec (elastic#144744) [DOCS] Document how reindex work in CPS (elastic#144016) Fix Int4 vector library tests failing on Java 21 (elastic#144830) [DiskBBQ] Fix index sorting on flush (elastic#144938) ...
…tic#144595) * Add new DLM Frozen Tier Transition executioplugin and service * Revert inspection profile change * Update x-pack/plugin/dlm-frozen-transition/src/test/java/org/elasticsearch/xpack/dlm/frozen/DlmFrozenTransitionServiceTests.java Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * PR changes * Fix flaky test * Move new tests to correct plugin * Fix plugin errors on startup * Improve thread pool termination and ensure proper clean-up during close * PR Changes * Update x-pack/plugin/dlm-frozen-transition/src/main/java/org/elasticsearch/xpack/dlm/frozen/DlmFrozenTransitionService.java Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com> --------- Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
…tic#144595) * Add new DLM Frozen Tier Transition executioplugin and service * Revert inspection profile change * Update x-pack/plugin/dlm-frozen-transition/src/test/java/org/elasticsearch/xpack/dlm/frozen/DlmFrozenTransitionServiceTests.java Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * PR changes * Fix flaky test * Move new tests to correct plugin * Fix plugin errors on startup * Improve thread pool termination and ensure proper clean-up during close * PR Changes * Update x-pack/plugin/dlm-frozen-transition/src/main/java/org/elasticsearch/xpack/dlm/frozen/DlmFrozenTransitionService.java Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com> --------- Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com> # Conflicts: # x-pack/plugin/dlm-frozen-transition/src/main/java/org/elasticsearch/xpack/dlm/frozen/DataStreamLifecycleConvertToFrozen.java
…tic#144595) * Add new DLM Frozen Tier Transition executioplugin and service * Revert inspection profile change * Update x-pack/plugin/dlm-frozen-transition/src/test/java/org/elasticsearch/xpack/dlm/frozen/DlmFrozenTransitionServiceTests.java Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * PR changes * Fix flaky test * Move new tests to correct plugin * Fix plugin errors on startup * Improve thread pool termination and ensure proper clean-up during close * PR Changes * Update x-pack/plugin/dlm-frozen-transition/src/main/java/org/elasticsearch/xpack/dlm/frozen/DlmFrozenTransitionService.java Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com> --------- Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
This PR adds a new X-Pack plugin for the DLM frozen tier and creates an execution framework for that transition.
Essentially there's a scheduler thread that checks for new indices with the "To transition" metadata maker every X minutes.
If it finds any, it checks if that index is already being transitioned and that there's available threads to execute the transition. If there is, it starts the transition on a thread in it's managed, fixed size thread pool. If not, the index is skipped if executing or all remaining indices are skipped if the thread pool is at capacity.
All this is gated behind a feature flag.
Also depends on #144511 to make Datas Streams plugin extensible / module exposed.
Fixes elastic/elasticsearch-team#2433